Random Indexing Re-Hashed

Author

  • Erik Velldal
Abstract

This paper introduces a modified version of Random Indexing (RI), a technique for dimensionality reduction based on random projections. We describe how RI can be implemented efficiently using universal hashing. This eliminates the need to store any random vectors, replacing them with a small number of hash functions and thereby dramatically reducing the memory footprint. We dub this reformulated version of the method Hashed Random Indexing (HRI).
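The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class `HashedIndexer`, the helpers `make_hash` and `key_of`, and the parameters `dim`, `nonzeros`, and `seed` are all hypothetical names chosen for illustration, and the hash family `((a*x + b) mod p) mod m` is one common universal-style family assumed here. Instead of storing a sparse random index vector per feature, the positions and signs of its nonzero entries are recomputed from a few hash functions whenever they are needed.

```python
import hashlib
import random

# Assumption: a Mersenne prime larger than any integer key produced by key_of().
PRIME = (1 << 61) - 1


def make_hash(rng, m):
    """Draw h(x) = ((a*x + b) mod PRIME) mod m with random a and b."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    return lambda x: ((a * x + b) % PRIME) % m


def key_of(feature):
    """Map a string to a stable 56-bit integer key.

    Python's built-in hash() is salted per process, so a digest is used instead.
    """
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=7).digest()
    return int.from_bytes(digest, "big")


class HashedIndexer:
    """Hypothetical sketch: index vectors are computed on demand, never stored."""

    def __init__(self, dim=512, nonzeros=4, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One position hash and one sign hash per nonzero entry.
        self.pos = [make_hash(rng, dim) for _ in range(nonzeros)]
        self.sign = [make_hash(rng, 2) for _ in range(nonzeros)]

    def entries(self, feature):
        """Return (index, sign) pairs; two positions may occasionally collide."""
        x = key_of(feature)
        return [(p(x), 1 if s(x) else -1) for p, s in zip(self.pos, self.sign)]

    def add(self, vector, feature, weight=1.0):
        """Accumulate the implicit index vector of `feature` into `vector`."""
        for i, sgn in self.entries(feature):
            vector[i] += sgn * weight


if __name__ == "__main__":
    indexer = HashedIndexer(dim=16, nonzeros=2, seed=42)
    context_vector = [0.0] * indexer.dim
    for context_word in ["cat", "sat", "mat", "cat"]:
        indexer.add(context_vector, context_word)
    print(context_vector)
```

In this sketch the only state kept for the projection is the handful of `(a, b)` coefficient pairs, regardless of how many distinct features are seen, which is the memory saving the abstract refers to.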

Similar resources

Broadcast Authentication With Hashed Random Preloaded Subsets

We introduce a novel cryptographic paradigm of broadcast authentication with “preferred” verifiers (BAP). With BAP, the message source explicitly targets a set of one or more verifiers. For an attacker, forging a source's authentication data in order to fool preferred verifiers may be substantially more difficult than fooling other (non-preferred) verifiers. We investigate broadcast au...

Advanced Techniques for Estimating and Refining Orientation Vectors of Space Object Imagery

We describe three advanced techniques incorporated into the design of a model-based image analysis system which automatically estimates the orientation vector of satellites and their sub-components. The system, implemented in Khoros, operates on images obtained from a ground-based optical surveillance system. Features of each satellite image are first extracted by partitioning the image and const...

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

Our recent work on large-scale learning using b-bit minwise hashing [21, 22] was tested on the webspam dataset (about 24 GB in LibSVM format), which may be way too small compared to real datasets used in industry. Since we could not access the proprietary dataset used in [31] for testing the Vowpal Wabbit (VW) hashing algorithm, in this paper we present an experimental study based on the expand...

3.5-Way Cuckoo Hashing for the Price of 2-and-a-Bit

The study of hashing is closely related to the analysis of balls and bins; items are hashed to memory locations much as balls are thrown into bins. In particular, Azar et al. [2] considered putting each ball in the less-full of two random bins. This lowers the probability that a bin exceeds a certain load from exponentially small to doubly exponential, giving maximum load log log n + O(1) with...

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

The indexing technique in a distributed object storage system is a crucial part of a large-scale application, where the index data structure may be published on many nodes. This raises the problem of preserving the privacy of ownership information while supporting queries on item locations with limited index space. Probabilistic data structures, such as the Bloom filter, which records the locat...

Publication date: 2011